
Collaborating Authors: Tekirdag Province


Hyperparameter Tuning Through Pessimistic Bilevel Optimization

Ustun, Meltem Apaydin, Xu, Liang, Zeng, Bo, Qian, Xiaoning

arXiv.org Artificial Intelligence

Automated hyperparameter search in machine learning, especially for deep learning models, is typically formulated as a bilevel optimization problem, with hyperparameter values determined by the upper level and model learning achieved by the lower-level problem. Most existing bilevel optimization solutions either assume the uniqueness of the optimal training model given hyperparameters or adopt an optimistic view when the non-uniqueness issue emerges. Model uncertainty may arise when training complex models with limited data, especially when the uniqueness assumption is violated, so the suitability of the optimistic view underlying current bilevel hyperparameter optimization solutions is questionable. In this paper, we propose pessimistic bilevel hyperparameter optimization, which selects outer-level hyperparameters that help the inner-level learned models generalize better by explicitly incorporating the potential uncertainty of the inner-level solution set. To solve the resulting computationally challenging pessimistic bilevel optimization problem, we develop a novel relaxation-based approximation method that derives pessimistic solutions with more robust prediction models. In our empirical studies of automated hyperparameter search for binary linear classifiers, pessimistic solutions demonstrate better prediction performance than their optimistic counterparts when training data are limited or testing data are perturbed, showing the necessity of considering pessimistic solutions alongside existing optimistic ones.
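As a sketch of the distinction the abstract draws (notation here is generic, not taken from the paper): the inner problem returns a solution set S(λ), and the optimistic and pessimistic formulations differ only in how that set is resolved at the outer level.

```latex
S(\lambda) = \arg\min_{w} \; \mathcal{L}_{\mathrm{train}}(w, \lambda)
\qquad \text{(inner problem: model training)}

\text{optimistic:} \quad \min_{\lambda} \; \min_{w \in S(\lambda)} \; \mathcal{L}_{\mathrm{val}}(w)
\qquad\qquad
\text{pessimistic:} \quad \min_{\lambda} \; \max_{w \in S(\lambda)} \; \mathcal{L}_{\mathrm{val}}(w)
```

When S(λ) is a singleton the two formulations coincide; the pessimistic max guards against the worst-performing inner minimizer when it is not.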


Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras

Dunion, Mhairi, Albrecht, Stefano V.

arXiv.org Artificial Intelligence

The performance of image-based Reinforcement Learning (RL) agents can vary depending on the position of the camera used to capture the images. Training on multiple cameras simultaneously, including a first-person egocentric camera, can leverage information from different camera perspectives to improve the performance of RL. However, hardware constraints may limit the availability of multiple cameras in real-world deployment. Additionally, cameras may become damaged in the real world, preventing access to all cameras that were used during training. To overcome these hardware constraints, we propose Multi-View Disentanglement (MVD), which uses multiple cameras to learn a policy that is robust to a reduction in the number of cameras, generalising to any single camera from the training set. Our approach is a self-supervised auxiliary task for RL that learns a disentangled representation from multiple cameras, with a shared representation that is aligned across all cameras to allow generalisation to a single camera, and a private representation that is camera-specific. We show experimentally that an RL agent trained on a single third-person camera is unable to learn an optimal policy in many control tasks, but our approach, benefiting from multiple cameras during training, is able to solve the task using only that same single third-person camera.
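A minimal sketch of the alignment idea described above, assuming (hypothetically) that each camera's embedding is a flat vector whose first `shared_dim` entries form the shared representation and the rest the private one; MVD's actual losses and architecture are not reproduced here.

```python
import numpy as np

def alignment_loss(embeddings, shared_dim):
    # mean pairwise squared distance between the shared parts of each
    # camera's embedding; minimising it aligns the shared representation
    shared = [z[:shared_dim] for z in embeddings]
    total, pairs = 0.0, 0
    for i in range(len(shared)):
        for j in range(i + 1, len(shared)):
            total += float(np.mean((shared[i] - shared[j]) ** 2))
            pairs += 1
    return total / pairs

# two cameras that agree on the shared half but differ on the private half
z_cam1 = np.array([1.0, 2.0, 5.0, 6.0])
z_cam2 = np.array([1.0, 2.0, -3.0, 9.0])
loss_aligned = alignment_loss([z_cam1, z_cam2], shared_dim=2)
```

Because only the shared halves enter the loss, the private halves are free to encode camera-specific detail, which is the disentanglement the abstract describes.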


Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking

Acikgoz, Emre Can, Erdogan, Mete, Yuret, Deniz

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages. This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations, with a special focus on Turkish. We conduct an in-depth analysis to evaluate the impact of training strategies, model choices, and data availability on the performance of LLMs designed for underrepresented languages. Our approach includes two methodologies: (i) adapting existing LLMs originally pretrained in English to understand Turkish, and (ii) developing a model from the ground up using Turkish pretraining data, both supplemented with supervised fine-tuning on a novel Turkish instruction-tuning dataset aimed at enhancing reasoning capabilities. The relative performance of these methods is evaluated through the creation of a new leaderboard for Turkish LLMs, featuring benchmarks that assess different reasoning and knowledge skills. Furthermore, we conducted experiments on data and model scaling, both during pretraining and fine-tuning, simultaneously emphasizing the capacity for knowledge transfer across languages and addressing the challenges of catastrophic forgetting encountered during fine-tuning on a different language. Our goal is to offer a detailed guide for advancing the LLM framework in low-resource linguistic contexts, thereby making natural language processing (NLP) benefits more globally accessible.


Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks

Acar, Cihan, Binici, Kuluhan, Tekirdağ, Alp, Wu, Yan

arXiv.org Artificial Intelligence

The use of multi-camera views simultaneously has been shown to improve the generalization capabilities and performance of visual policies. However, the hardware cost and design constraints in real-world scenarios can potentially make it challenging to use multiple cameras. In this study, we present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks. Our proposed method involves utilizing a technique known as knowledge distillation, in which a pre-trained ``teacher'' policy trained with multiple camera viewpoints guides a ``student'' policy in learning from a single camera viewpoint. To enhance the student policy's robustness against camera location perturbations, it is trained using data augmentation and extreme viewpoint changes. As a result, the student policy learns robust visual features that allow it to locate the object of interest accurately and consistently, regardless of the camera viewpoint. The efficacy and efficiency of the proposed method were evaluated both in simulation and real-world environments. The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone. Furthermore, the student policy demonstrates zero-shot transfer capability, where it can successfully grasp and lift objects in real-world scenarios for unseen visual configurations.
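The teacher-to-student transfer can be illustrated with a standard policy-distillation objective; the temperature, the KL direction, and the logits below are generic assumptions, not details from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened action
    # distributions: zero when the student reproduces the teacher
    t = softmax(np.asarray(teacher_logits) / temperature)
    s = softmax(np.asarray(student_logits) / temperature)
    return float(np.sum(t * (np.log(t) - np.log(s))))

# teacher sees multi-camera input; the single-camera student is trained
# to match its outputs on the same states
match = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
mismatch = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

Minimising this loss over single-view observations transfers the multi-view teacher's behaviour without requiring multiple cameras at deployment.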


Translation Aligned Sentence Embeddings for Turkish Language

Unlu, Eren, Ciftci, Unver

arXiv.org Artificial Intelligence

Due to the limited availability of high-quality datasets for training sentence embeddings in Turkish, we propose a training methodology and regimen to develop a sentence embedding model. The central idea is simple but effective: fine-tune a pretrained encoder-decoder model in two consecutive stages, where the first stage aligns the embedding space with translation pairs. Thanks to this alignment, the prowess of the main model can be better projected onto the target language in a sentence embedding setting, where it can be fine-tuned to high accuracy in a short time with a limited target-language dataset.
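A toy sketch of the stage-one alignment objective, under the assumption (ours, not the paper's) that it pulls each source-language sentence embedding toward the embedding of its translation via cosine similarity.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def translation_alignment_loss(src_embs, tgt_embs):
    # stage-one objective sketch: pull each source-language sentence
    # embedding toward the embedding of its translation
    return float(np.mean([1.0 - cosine(s, t) for s, t in zip(src_embs, tgt_embs)]))

# perfectly aligned pair -> zero loss; orthogonal pair -> loss of 1
aligned = translation_alignment_loss([np.array([1.0, 0.0])], [np.array([1.0, 0.0])])
misaligned = translation_alignment_loss([np.array([1.0, 0.0])], [np.array([0.0, 1.0])])
```

Once the two languages share an embedding space, the second fine-tuning stage can proceed with far less target-language data.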


Entity Embeddings: Perspectives Towards an Omni-Modality Era for Large Language Models

Unlu, Eren, Ciftci, Unver

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are evolving to integrate multiple modalities, such as text, image, and audio into a unified linguistic space. We envision a future direction based on this framework where conceptual entities defined in sequences of text can also be imagined as modalities. Such a formulation has the potential to overcome the cognitive and computational limitations of current models. Several illustrative examples of such potential implicit modalities are given. Along with vast promises of the hypothesized structure, expected challenges are discussed as well.



Transfer Learning for Electricity Price Forecasting

Gunduz, Salih, Ugurlu, Umut, Oksuz, Ilkay

arXiv.org Machine Learning

Electricity price forecasting has been studied in different markets separately, and learning interdependent information between different markets is an understudied field. Recently, deep learning methods have showcased superior performance in predicting electricity prices [1]. In particular, recurrent neural networks have been able to learn sequential information in time-series data sets [2]. Most of the literature on the application of neural networks to electricity price forecasting has relied on single-market data, and the large amounts of data available from different markets have not been utilized. Transfer learning is a major tool to improve performance on image classification problems: networks can be trained on similar problems before being trained on the final problem, to leverage the data to the fullest. In this paper, we utilize the concept of transfer learning for electricity price forecasting by using data from five different markets. Our major novelties are: (1) we investigate different ways to combine data from different electricity markets when training neural networks, and (2) we propose a transfer learning scheme to leverage data from different markets when training recurrent neural networks (RNNs) for the task of price prediction.
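The pretrain-then-fine-tune scheme can be sketched with a toy linear forecaster in place of an RNN; the synthetic price series and the warm-start mechanism below are illustrative assumptions only.

```python
import numpy as np

def train(X, y, w=None, lr=0.1, epochs=300):
    # least-squares linear forecaster fitted by gradient descent;
    # passing w continues from pretrained weights (the transfer step)
    X = np.c_[X, np.ones(len(X))]          # append a bias column
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

# hypothetical noise-free toy series standing in for prices of two markets
rng = np.random.default_rng(0)
true_w = np.array([1.0, -0.5, 0.2])
X_src = rng.normal(size=(200, 3)); y_src = X_src @ true_w + 3.0  # data-rich source
X_tgt = rng.normal(size=(20, 3));  y_tgt = X_tgt @ true_w + 3.0  # scarce target

w_pre = train(X_src, y_src)            # pretrain on the source market
w_ft = train(X_tgt, y_tgt, w=w_pre)    # fine-tune on the target market
w_scratch = train(X_tgt, y_tgt)        # target-only baseline for comparison
```

Fine-tuning from `w_pre` starts the target-market model near a good solution, which is the benefit sought when target-market data are scarce.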


Forecast monsters fed by big data

#artificialintelligence

Think about how much we need cognitive technologies that enable us to make accurate estimates. There is a need for knowledge, i.e. cognitive computing, to save the potato crop from speculators and to predict elections and the weather accurately. Will it hail, when will it hail, how long will it last, how big will the hailstones be, and where will it hail in Istanbul, from Tekirdağ to Kocaeli? Not Nostradamus but big data and cognitive computing technology lets you find the right answers. On weather forecasting: it was not in vain that IBM bought The Weather Company, which has the world's most sensitive, precise and reliable weather data, at the beginning of 2016.


Adaptive Mixtures of Factor Analyzers

Kaya, Heysem, Salah, Albert Ali

arXiv.org Machine Learning

A mixture of factor analyzers is a semi-parametric density estimator that generalizes the well-known mixtures of Gaussians model by allowing each Gaussian in the mixture to be represented in a different lower-dimensional manifold. This paper presents a robust and parsimonious model selection algorithm for training a mixture of factor analyzers, carrying out simultaneous clustering and locally linear, globally nonlinear dimensionality reduction. Permitting a different number of factors per mixture component, the algorithm adapts the model complexity to the data complexity. We compare the proposed algorithm with related automatic model selection algorithms on a number of benchmarks. The results indicate the effectiveness of this fast and robust approach in clustering, manifold learning and class-conditional modeling.
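A minimal sketch of the density a mixture of factor analyzers defines, where each component k has its own loading matrix Λ_k (possibly with a different number of columns, i.e. factors) and diagonal noise Ψ_k; the paper's model-selection algorithm itself is not shown.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    # density of a multivariate normal, evaluated directly
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm)

def mfa_density(x, pis, mus, lambdas, psis):
    # each component k is a factor analyzer with covariance
    # Lambda_k Lambda_k^T + Psi_k; Lambda_k may have a different
    # number of columns (factors) per component
    return sum(pi * gaussian_pdf(x, mu, L @ L.T + np.diag(psi))
               for pi, mu, L, psi in zip(pis, mus, lambdas, psis))

# two 2-D components: one with a single factor, one with two
pis = [0.5, 0.5]
mus = [np.zeros(2), np.array([5.0, 5.0])]
lambdas = [np.array([[1.0], [0.5]]), np.eye(2)]
psis = [np.array([0.1, 0.1]), np.array([0.1, 0.1])]
p_near = mfa_density(np.zeros(2), pis, mus, lambdas, psis)
p_far = mfa_density(np.array([20.0, 20.0]), pis, mus, lambdas, psis)
```

The low-rank-plus-diagonal covariance is what confines each component to its own lower-dimensional manifold while keeping the parameter count small.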